Add pipeline queue monitoring with Prometheus metrics by Copilot · Pull Request #497 · internetarchive/Zeno

Copilot · 2025-09-23T08:26:21Z

This PR implements comprehensive queue monitoring for Zeno's processing pipeline, allowing operators to identify bottlenecks by tracking the queue sizes of each component through Prometheus metrics.

Overview

The implementation adds real-time monitoring of all pipeline channels, integrated with both Prometheus and the TUI. Operators can see which components are slowing down the crawl by looking for the largest queues.

Key Features

Pipeline Component Tracking

All 8 pipeline channels are now monitored, each under a descriptive component name:

reactor_to_preprocessor - Items flowing from reactor to preprocessor
preprocessor_to_archiver - Items flowing from preprocessor to archiver
archiver_to_postprocessor - Items flowing from archiver to postprocessor
postprocessor_to_finisher - Items flowing from postprocessor to finisher
finisher_to_source - Finished items returned to source (HQ/LQ)
source_to_finisher - New items from source to finisher
hq_outlinks_finish - HQ outlinks finish channel
hq_outlinks_produce - HQ outlinks produce channel

Prometheus Metrics Integration

Component queue sizes are exposed as Prometheus metrics with the following format:

# HELP zeno_component_queue_size Size of component queues in the processing pipeline
# TYPE zeno_component_queue_size gauge
zeno_component_queue_size{component="reactor_to_preprocessor",hostname="host",project="proj",version="ver"} 0
zeno_component_queue_size{component="preprocessor_to_archiver",hostname="host",project="proj",version="ver"} 2
zeno_component_queue_size{component="archiver_to_postprocessor",hostname="host",project="proj",version="ver"} 1
...

TUI Integration

Queue sizes are now displayed in the terminal interface as part of the stats panel, refreshing every 500ms alongside existing metrics.

Usage Examples

# Start crawl with Prometheus metrics
./Zeno get url --api --prometheus --api-port 9000 https://example.com

# Query component queue sizes
curl http://localhost:9000/metrics | grep component_queue

# Use with Prometheus/Grafana for monitoring and alerting
# Use TUI for live monitoring
./Zeno get url --tui https://example.com
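
The `curl | grep` check above can also be done programmatically. The following is a rough sketch, not part of the PR, that scans text scraped from the metrics endpoint and reports the component with the largest queue. `LargestQueue` is a hypothetical helper, and the parsing assumes label values contain no escaped quotes, which holds for the component names listed earlier.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const metricPrefix = "zeno_component_queue_size{"

// LargestQueue (hypothetical helper) returns the component with the
// largest queue size found in the given metrics text. ok is false when
// no matching sample is present. On ties the first sample wins.
func LargestQueue(metrics string) (component string, size float64, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, metricPrefix) {
			continue // skips "# HELP", "# TYPE", and unrelated metrics
		}

		const key = `component="`
		start := strings.Index(line, key)
		closing := strings.LastIndex(line, "} ")
		if start < 0 || closing < 0 {
			continue
		}

		// Assumption: the label value has no escaped quotes inside it.
		rest := line[start+len(key):]
		end := strings.IndexByte(rest, '"')

		// The value is the first field after the label set; an optional
		// timestamp may follow it.
		fields := strings.Fields(line[closing+2:])
		if end < 0 || len(fields) == 0 {
			continue
		}

		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		if !ok || v > size {
			component, size, ok = rest[:end], v, true
		}
	}
	return component, size, ok
}

func main() {
	// Sample copied from the metrics format shown in this PR description.
	sample := `# TYPE zeno_component_queue_size gauge
zeno_component_queue_size{component="reactor_to_preprocessor",hostname="host",project="proj",version="ver"} 0
zeno_component_queue_size{component="preprocessor_to_archiver",hostname="host",project="proj",version="ver"} 2
zeno_component_queue_size{component="archiver_to_postprocessor",hostname="host",project="proj",version="ver"} 1
`
	if c, n, ok := LargestQueue(sample); ok {
		fmt.Printf("largest queue: %s (%v items)\n", c, n)
	} else {
		fmt.Println("no queue metrics found")
	}
}
```

In a real deployment this comparison would more naturally live in a Prometheus query or Grafana panel; the sketch only shows how the `component` label can be read back out of the scraped text.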

Technical Implementation

Thread-safe: Uses RWMutex for concurrent access to channel tracking
Low overhead: Minimal performance impact when not actively monitoring
Dependency injection: Testable design with proper separation of concerns
Real-time: Queue sizes calculated using len(channel) for instant feedback
Prometheus integration: Updates every 250ms alongside other stats
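
The combination of an RWMutex-guarded registry and `len(channel)` described above can be sketched as follows. This is an illustration under assumptions, not the PR's actual code: `QueueRegistry`, `Register`, `TrackChan`, and `Snapshot` are hypothetical names, and the real implementation in `internal/pkg/stats` may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// QueueRegistry is a hypothetical tracker mapping component names to
// functions that report the current queue length.
type QueueRegistry struct {
	mu     sync.RWMutex
	queues map[string]func() int
}

func NewQueueRegistry() *QueueRegistry {
	return &QueueRegistry{queues: make(map[string]func() int)}
}

// Register stores a size probe under a component name (write lock).
func (r *QueueRegistry) Register(name string, size func() int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.queues[name] = size
}

// TrackChan registers a channel of any element type. len(ch) reports
// the number of items currently buffered in the channel.
func TrackChan[T any](r *QueueRegistry, name string, ch chan T) {
	r.Register(name, func() int { return len(ch) })
}

// Snapshot reads every queue size under a read lock, so concurrent
// readers (for example a metrics updater and a TUI) do not block each other.
func (r *QueueRegistry) Snapshot() map[string]int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]int, len(r.queues))
	for name, size := range r.queues {
		out[name] = size()
	}
	return out
}

func main() {
	reg := NewQueueRegistry()

	toArchiver := make(chan string, 8)
	toPostprocessor := make(chan string, 8)
	TrackChan(reg, "preprocessor_to_archiver", toArchiver)
	TrackChan(reg, "archiver_to_postprocessor", toPostprocessor)

	// Simulate a backlog in front of the archiver.
	toArchiver <- "https://example.com/a"
	toArchiver <- "https://example.com/b"

	snap := reg.Snapshot()
	names := make([]string, 0, len(snap))
	for name := range snap {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s %d\n", name, snap[name])
	}
}
```

Note that `len` on a channel is only a point-in-time reading: the value may already be stale when it is reported, which is acceptable for a gauge sampled every few hundred milliseconds.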

Benefits

Bottleneck Identification: Quickly identify which components are slower than others
Performance Optimization: Data-driven capacity planning and tuning
Monitoring Integration: Native Prometheus metrics enable integration with standard monitoring tools
Historical Analysis: Time-series data for trend analysis and alerting

This implementation provides pipeline visibility for operators to identify and resolve performance bottlenecks effectively using standard monitoring infrastructure.

Fixes #471.


codecov-commenter · 2025-09-23T08:38:07Z

Codecov Report

❌ Patch coverage is 72.72727% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.42%. Comparing base (1f0d58f) to head (ca40c1f).
⚠️ Report is 50 commits behind head on main.

Files with missing lines	Patch %	Lines
internal/pkg/stats/methods.go	16.66%	4 Missing and 1 partial ⚠️
internal/pkg/stats/prometheus.go	0.00%	5 Missing ⚠️
internal/pkg/stats/stats.go	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #497      +/-   ##
==========================================
+ Coverage   56.39%   56.42%   +0.03%     
==========================================
  Files         130      131       +1     
  Lines        8120     8167      +47     
==========================================
+ Hits         4579     4608      +29     
- Misses       3172     3188      +16     
- Partials      369      371       +2

Flag	Coverage Δ
e2etests	`40.64% <47.27%> (-0.01%)`	⬇️
unittests	`29.52% <49.09%> (+0.20%)`	⬆️



CorentinB · 2025-09-23T09:14:42Z

@copilot a warcprox-compliant endpoint wasn't asked for. Add things like component_queues information to the Prometheus endpoint instead.


Copilot · 2025-09-23T09:21:56Z

@copilot a warcprox-compliant endpoint wasn't asked for. Add things like component_queues information to the Prometheus endpoint instead.

Implemented as requested. Removed the warcprox-compliant /status endpoint and added component queue information to the Prometheus metrics instead.

The component queue sizes are now available as zeno_component_queue_size gauge metrics with component labels, updating every 250ms alongside other stats. This provides better integration with standard monitoring infrastructure.

Usage: curl http://localhost:9000/metrics | grep component_queue

Commit: ca40c1f

Initial plan

79f6d3f

Copilot AI assigned Copilot and CorentinB Sep 23, 2025

Copilot started work on behalf of CorentinB September 23, 2025 08:26

Implement channel queue monitoring with API endpoint

9e0a621

Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>

CorentinB force-pushed the main branch from 899d31b to 1f0d58f September 23, 2025 08:41

Copilot AI changed the title ~~[WIP] Which components are the slowest?~~ Add pipeline queue monitoring with warcprox-compatible API endpoint Sep 23, 2025

Copilot AI requested a review from CorentinB September 23, 2025 08:42

Copilot finished work on behalf of CorentinB September 23, 2025 08:42

Copilot started work on behalf of CorentinB September 23, 2025 09:14

Replace status endpoint with Prometheus component queue metrics

ca40c1f

Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>

Copilot AI changed the title ~~Add pipeline queue monitoring with warcprox-compatible API endpoint~~ Add pipeline queue monitoring with Prometheus metrics Sep 23, 2025

Copilot finished work on behalf of CorentinB September 23, 2025 09:23

Copilot wants to merge 3 commits into main from copilot/fix-471

3 participants
